Method of operating a barge-in dialogue system

ABSTRACT

A method is described for multi-user operation of a barge-in dialogue system ( 1 ). The dialogue system comprises a front-end computer unit ( 2 ) with a plurality of access channels ( 6 ) for the users and a plurality of servers ( 18, 19, 20, 21 ), each having a number of speech processing units ( 22 ). Each of the speech processing units ( 22 ) comprises a speech activity detector ( 23 ) and a speech recognition unit ( 24 ). During a dialogue between the system and a user, a new speech processing unit ( 22 ) is repeatedly assigned, at various specific times, to the access channel ( 6 ) used by the user so as to achieve as uniform a utilization of the servers ( 18, 19, 20, 21 ) as possible. The speech activity detector ( 23 ) detects an incoming speech signal on the access channel ( 6 ) to which the speech processing unit ( 22 ) is assigned at that time, and activates the speech recognition unit ( 24 ). In addition, a corresponding barge-in dialogue system ( 1 ) is described.

The invention relates to a method of operating a barge-in dialogue system for parallel use by a plurality of users, i.e. for use in so-termed “multi-user operation”. In addition, the invention relates to a corresponding barge-in dialogue system. Barge-in dialogue systems are understood to be speech dialogue systems which make it possible for a user to interrupt a running system output.

Speech dialogue systems which communicate with a user by means of speech recognition and/or speech output devices have been known for a long time. Examples are automatic telephone answering machines and enquiry systems, which are now used in particular by several larger firms and offices to provide a caller with the desired information in the fastest and most convenient way possible, or to connect him/her to a location which is appropriate for the specific wishes of the caller. Further examples are automatic directory enquiry systems, automatic timetable systems, information services with general information on events for a certain region, for example cinema and theater programs, or also combinations of the various enquiry systems. Such speech-controlled automatic dialogue systems are often referred to as voice portals or language applications.

In order to serve several users simultaneously, the dialogue system accordingly has to comprise a plurality of access channels for the users. These may be access channels for connection to a suitable terminal of the user, which comprises an acoustic user interface with a microphone for the user to input speech commands to the dialogue system and a loudspeaker, headphones or the like for issuing acoustic system outputs to the user. For example, the terminal may be a telephone, a mobile radio device or a PC of the user, and the access channels may be corresponding telephone and/or Internet connections. In a stationary dialogue system, for example a terminal at a public place such as a railway station, airport, museum etc., the access channels may be, for example, headsets or the like with which the users can communicate with the terminal. Furthermore, the speech dialogue system usually comprises a dialogue control in the form of a software module for each access channel. This dialogue control controls the course of a dialogue with a user via the respective access channel and causes, for example at certain positions in the dialogue, a system output to be given to the user via the respective access channel.

The system output, generally also called a prompt, may be, for example, a request for input addressed to the user or information requested by the user. To generate such an acoustic prompt, the speech dialogue system needs to have a suitable speech output device, for example a text-to-speech converter which converts text information of the dialogue system into speech for the user and outputs it over the access channel. The speech output device may, however, also have ready-made stored sound files which are played back to the user at an appropriate time. As a rule, the speech dialogue system has its own speech output device for each access channel. However, it is also possible for several access channels to share a common speech output device.

To recognize a speech signal coming in on an access channel, i.e. an arbitrary speech utterance of the user such as a word, a word combination or a sentence, and to be able to react to it accordingly, a speech recognition unit, usually a software module, is utilized. The audio data of the speech signal are conveyed to the speech recognition unit for this purpose, and the speech recognition unit delivers the result of the recognition, for example, to the dialogue control.

Since speech recognition requires a relatively large amount of computing power, dialogue systems that handle a plurality of users are often physically built up from a plurality of computer units. The system then comprises one or more so-called front-end computer units having a plurality of access channels (ports). A front-end computer unit is usually the computer unit of the system that communicates directly with the users via the access channels. The dialogue controls fixedly assigned to the access channels are usually located on the respective front-end computer unit. The speech output devices may also be located on the front-end computer unit. The speech recognition unit or units, on the other hand, are located on a separate computer unit, referred to as a server in the following, which can make the necessary computing power available for the speech recognition. With larger systems it is customary in practice to utilize a plurality of servers in the system, one or more speech recognition units being implemented on each server.

The dialogue control responsible for the respective access channel can then select a free speech recognition unit at the right time, for example at the end of a prompt, and assign it to the respective access channel so that an incoming speech signal of the user can immediately be processed and recognized. It is desirable for the selection of one of the available speech recognition units to be effected such that the servers accommodating the speech recognition units are evenly loaded. As a result, optimum use of the capacity of the system and thus a maximum processing speed can be achieved. Such a procedure is naturally only possible if the dialogue system, or the dialogue control respectively, knows beforehand when a speech recognition unit is required for the respective access channel. This poses no problem in dialogue systems that allow the user to input only at certain times, that is to say, after a prompt has ended. Such systems, however, are relatively unnatural as regards their behavior towards the user. As is known, users are often inclined to respond before the dialogue system has finished a request for input. This is especially the case when the user already knows exactly, or suspects, what input the system requires of him and which possibilities are available to him at this part of the dialogue. Such an interruption of the system output furthermore often occurs when information is being output which the user wishes to cut short. Barge-in dialogue systems, which make the user's interruption of a running system output possible, are considerably more natural in their behavior. In addition, they are also more convenient for the user, because the user always has a possibility to intervene, need not wait for the end of a prompt and, as a rule, also reaches the position in the dialogue routine where the desired information is output earlier.

To guarantee that a speech signal of the user is recognized at any time, which is necessary for a barge-in dialogue system, there are various possibilities:

One possibility is to permanently assign to each access channel its own speech processing unit. With a large number of access channels this leads to a correspondingly large number of speech recognition units. Since the system cannot influence on which of these access channels the associated speech recognition units are needed simultaneously, this may lead to an extraordinarily high load on the servers at a certain time. In order to guarantee that the dialogue system can still work reasonably fast in such situations, the computing power of the individual servers has to be dimensioned sufficiently large, so that all the speech recognition units located on the server can work simultaneously without any problem.

A further possibility of producing a barge-in dialogue system for a plurality of users consists of utilizing speech activity detectors (SADs), exactly one such speech activity detector being assigned to each access channel. A detection of the speech activity is practical anyway in barge-in dialogue systems, so that the system can immediately interrupt a running system output if the user inputs a speech signal. Otherwise the user and the dialogue system would “speak” simultaneously, which may irritate the user and, due to the echo of the system output in the input signal, could also complicate the recognition of the user's speech signal by the speech recognition unit. These speech activity detectors may be implemented by a simple energy detection on the access channel, which requires only relatively little computing power. Consequently, one SAD can be made available for each access channel in a 1:1 assignment without any problem, which SAD is implemented together with the associated access channel on the respective front-end computer unit. Analogously to the non-barge-in dialogue system mentioned above, such a system architecture allows a speech recognition unit to be assigned to an access channel exactly when a speech recognition unit is needed on the respective access channel. Accordingly, it is possible without any problem in such a system to aim for as even a server load as possible when the speech recognition units are assigned to the access channels. Especially with larger systems comprising very many channels and very many speech recognition units it is furthermore possible, due to the statistically low probability that a speech recognition unit is required on all access channels at the same time, for the number of available speech recognition units to be lower than the number of access channels.
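The "simple energy detection" mentioned above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the frame representation (a sequence of samples normalized to the range -1.0 to 1.0) and the threshold value are assumptions chosen purely for illustration.

```python
from typing import Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Return the mean squared amplitude of one audio frame."""
    if not frame:
        return 0.0
    return sum(sample * sample for sample in frame) / len(frame)


def is_speech(frame: Sequence[float], threshold: float = 0.01) -> bool:
    """Flag a frame as speech when its energy exceeds the threshold.

    The default threshold is an assumed value; a real detector would
    have to be tuned to the channel and its noise floor.
    """
    return frame_energy(frame) > threshold
```

Such a check needs only one multiplication and one addition per sample, which is consistent with the statement that the detection requires relatively little computing power.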

A great disadvantage of such a system, however, is that some time elapses between the detection of the speech by the SAD and the actual physical assignment of the access channel to a speech recognition unit, during which the user goes on talking. It is therefore necessary for the user's speech signal, i.e. a large amount of audio data, to be buffered first and then switched to the speech recognition unit as soon as the latter is ready to operate. Such buffering of the audio data is, on the one hand, complex and thus cost-intensive. On the other hand, it reduces the efficiency of the system.

It is an object of the present invention to provide a method for multi-user operation of a barge-in dialogue system, and a corresponding barge-in dialogue system, which is always capable of rapidly processing an incoming speech signal of the user in a simple manner while the total computing power required by the system is minimized.

This object is achieved in that, in a dialogue system which comprises one or more front-end computer units with a plurality of access channels for the users and a plurality of servers each having a number of speech processing units, each of which comprises a speech activity detector and a speech recognition unit, a new speech processing unit on one of the servers is repeatedly assigned, at various specific times during a dialogue with a user, to the access channel of the front-end computer unit utilized by that user, so that the servers are loaded as evenly as possible, and in that the speech activity detector detects a speech signal coming in on the currently assigned access channel and activates the speech recognition unit. As regards the device, the object is achieved by a barge-in dialogue system with a corresponding number of speech processing units arranged on several servers, each comprising a speech recognition unit and a speech activity detector for detecting an incoming speech signal and activating the speech recognition unit, and comprising an access co-ordination unit which, repeatedly during a dialogue with a user and at various specific times, assigns a new speech processing unit on one of the servers to the access channel of the front-end computer unit used by the user, so that the servers are loaded as evenly as possible. The dependent claims contain highly advantageous embodiments and further aspects of the invention.

According to the invention there are speech processing units on the servers, which units comprise, on the one hand, a speech activity detector and, on the other hand, a speech recognition unit. That is to say, the speech activity detector, which detects an incoming speech signal and activates the speech recognition unit, forms a speech processing unit in combination with the speech recognition unit. The speech activity detector and the speech recognition unit may actually be separate units which are combined, i.e. grouped, into one speech processing unit. However, it is alternatively possible for the speech activity detector and the speech recognition unit to be integrated into a speech processing unit so that they can be considered separate operating modes of the speech processing unit and utilize, for example, common software routines or memory areas etc.

The barge-in dialogue system is operated according to the invention so that, repeatedly during a dialogue with a user and at various specific times, a new speech processing unit on one of the servers is assigned to the access channel of the front-end computer unit used by that user. This new assignment is made so that the servers are loaded as evenly as possible. This means that there is a continual reassignment of the speech processing units to the active access channels, while the instants for the reassignment of a speech processing unit to a certain access channel of the system are chosen such that there is only a slim chance of a speech processing unit being needed on the respective access channel precisely during the reassignment.

A barge-in dialogue system according to the invention consequently needs to have a suitable access co-ordination unit (Resource Manager) which repeatedly assigns the speech processing units of the various servers to the respective access channels at the desired times so that a uniform load of the servers is guaranteed.

The grouping of the speech activity detectors and the speech recognition units on the servers into said speech processing units is advantageous, on the one hand, in that the front-end computer units are not loaded by speech activity detectors. On the other hand, audio data streams arriving at a certain speech activity detector can be processed directly by the associated speech recognition unit and need not once again be physically diverted between various computers, which would take up additional time and would also require a buffering of the audio data, which should be avoided at all costs.

Owing to the continual reassignment of the speech processing units to the access channels and the resulting even loading of the servers, it is possible for a larger number of speech processing units to be logically arranged on one server, while the physical computing power of the servers need not be dimensioned such that all the speech processing units on the server can work simultaneously at full power. It is therefore possible without any problem, despite a lower computing power on the servers, to logically arrange as many speech processing units as there are access channels, the processing units each comprising a speech activity detector and a speech recognition unit.

Preferably, even a larger number of speech processing units than access channels can be made available, to thus achieve a higher flexibility when a speech processing unit is reassigned to an access channel. The advantage of such “overcapacity” of speech processing units shows particularly when very many users utilize the dialogue system simultaneously at a certain instant and substantially all access channels are seized so that, as a result, a large part of the speech processing units have already been assigned to an access channel. As a rule, however, the speech recognition unit, which draws more computing power from the respective server, is active in only part of the speech processing units at this particular instant. In a large part of the speech processing units, on the other hand, only the speech activity detector is active, which requires only little computing power. The high number of calls may, however, lead to a situation in which no speech processing unit is available any more on certain servers, although these servers are only slightly loaded as regards their computing power and an assignment of an access channel to a speech processing unit on one of these servers would in itself be optimal for an even load of the servers. In the extreme case of a 1:1 ratio of access channels to speech processing units and full utilization of all the access channels by as many users, no reassignment would be possible at all. However, if more speech processing units than there are access channels are logically arranged on the servers, at least one reassignment will always be possible, while, with an increasing number of spare speech processing units, it becomes more likely that at any time at least one non-seized speech processing unit is still available on each of the servers, so that an assignment that is optimal with respect to the server load can be carried out at any time.

The dialogue system is preferably operated, or the assignment is made, so that one of the speech processing units is in essence permanently assigned to each active access channel over which a dialogue between the system and a user takes place. This means that, during the dialogue, one of the speech processing units is nearly constantly available to each of the access channels, i.e. with the exception of brief moments in which a reassignment of the speech processing unit to the respective access channel is made, although the speech processing units assigned are usually constantly changing. If there are certain deliberately provided times in a dialogue routine in which, for example, an interruption of a system output is undesired, obviously no speech processing unit needs to be assigned to the respective access channel during these times.

In a highly advantageous example of embodiment the system comprises means for signaling to the access co-ordination unit when the recognition, by the speech recognition unit, of a speech signal that previously arrived on an access channel has been terminated and/or when a new system output to the user can commence over this access channel. This may be effected, for example, by a signal from the speech processing unit itself which announces that the recognition has been terminated. Alternatively, a respective signal may also come from the dialogue control, which has received the necessary information from the speech recognition unit and now continues the dialogue in accordance with the received speech signal of the user and causes a system output to be given to the user. The reassignment of the speech processing unit to the respective access channel may then preferably be effected immediately after the recognition of the speech signal or within a predefined brief period of time at the beginning of the next system output to the user. This is a highly suitable period for reassignment because, typically, a system output is not interrupted by the user during the first couple of milliseconds, and thus a speech recognizer is probably not needed on the access channel at this instant. In this way it is guaranteed that a speech recognizer is immediately available substantially whenever it could be needed. The probability that audio data occasionally have to be buffered may therefore be neglected.

Since according to the invention the speech activity detectors are not located in the front-end computer unit, it is not necessary to lead the audio data streams through the processor of the front-end computer unit for speech detection. The audio data are therefore preferably conveyed from the access channel to the currently assigned speech processing unit without being led through the processor. This is possible in that a pure hardware circuit, for example a so-termed switch matrix, is used for conveying the audio data streams from the access channel to the servers. Since the processor, which would form a bottleneck for the audio data streams, is completely bypassed in this manner, considerably more channels can be realized in the respective front-end computer unit with such a hardware solution. In this way it is possible without any problem to provide, for example, 500 to 1000 or more access channels in a system in which about 120 access channels could be implemented via a software solution.

For the selection method with which a speech processing unit is selected for an access channel in case of reassignment so as to obtain an even load of the servers, the known selection methods of the non-barge-in systems can be used.

For example, the method known as Round Robin can be used, in which a change is made cyclically from one server to the next. This method can be implemented at extremely low cost. However, an even load is reached only on the basis of a statistically assumed uniform distribution, so that in individual cases a relatively non-uniform load may temporarily arise.
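The cyclic change from one server to the next can be sketched in a few lines of Python. The class name, the method name and the server labels are hypothetical and only serve the illustration; the text does not prescribe any particular implementation.

```python
from itertools import cycle
from typing import Iterable, Iterator


class RoundRobinSelector:
    """Hand out servers in a fixed cyclic order, ignoring their actual load."""

    def __init__(self, servers: Iterable[str]) -> None:
        names = list(servers)
        if not names:
            raise ValueError("at least one server is required")
        self._cycle: Iterator[str] = cycle(names)

    def next_server(self) -> str:
        """Return the server that follows the previously returned one."""
        return next(self._cycle)
```

Because the selector never looks at load values, it is cheap, but it cannot react when one server happens to receive several recognition-heavy channels in a row, which is the temporary imbalance described above.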

A similar method is the so-called Least-Use method, in which the computer that was not used last is always chosen.

A slightly more expensive method, but one that is reliable with respect to the even load, is the so-called Load Balancing method, in which the server currently having the smallest load is always chosen. This method is the preferred method because an even load can be reached even in extreme cases. For this purpose the system preferably includes means for determining the load values for the individual speech processing units or servers, respectively, and for delivering these load values to the access co-ordination unit, which then, based on the load values of the individual units or servers, makes a decision about the reassignment of a speech processing unit to an access channel.
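A minimal sketch of such a load-based selection is given below in Python. The record type `ServerLoad`, its field names and the 0.0 to 1.0 load scale are assumptions made for the example; the text only states that load values are reported to the access co-ordination unit.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ServerLoad:
    name: str
    load: float        # reported utilization, assumed to lie between 0.0 and 1.0
    free_units: int    # speech processing units not currently assigned


def pick_least_loaded(servers: Iterable[ServerLoad]) -> Optional[ServerLoad]:
    """Return the least-loaded server that still has a free unit, if any."""
    candidates = [s for s in servers if s.free_units > 0]
    if not candidates:
        return None
    # min() keeps the first candidate when several share the lowest load.
    return min(candidates, key=lambda s: s.load)
```

Restricting the choice to servers with a free unit reflects the point made earlier in the text that a lightly loaded server is of no use for reassignment if all of its speech processing units are already seized.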

The invention will be further described in the following with reference to the appended Figure with the aid of an example of embodiment. The sole Figure shows a coarsely diagrammatic block diagram of a barge-in dialogue system 1 according to the invention, with only the arrangement of the components essential to the invention being represented.

This barge-in dialogue system 1 comprises, in essence, a front-end computer unit 2 and a plurality of servers 18, 19, 20, 21. The front-end computer unit 2 has access channels 6 for the users. In the present example of embodiment the access channels 6 are telephone access channels, for example ISDN channels. A respective plurality of speech processing units 22 is located on the servers 18, 19, 20, 21. Each of the speech processing units 22 contains a speech activity detector 23 and a speech recognition unit 24.

The example of embodiment shown has more speech processing units 22 than there are access channels 6 on the front-end computer unit 2. In the present case the dialogue system 1 has only eight access channels 6 for clarity. In contrast, the dialogue system 1 here has four servers 18, 19, 20, 21, on each of which three speech processing units 22 are logically arranged. This means that twelve speech processing units 22 are available for the eight access channels 6. The dialogue system 1 may, however, also have fewer servers or a considerably larger number of servers, while the number of speech processing units 22 per server 18, 19, 20, 21 is also arbitrary and is limited only by the computing power and the storage capacity of the respective servers 18, 19, 20, 21. The servers 18 to 21 may also have different computing powers and different numbers of speech processing units 22.
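The capacity figures of this example can be checked with a short Python calculation. The function name is hypothetical; the numbers are those given in the text (four servers with three units each, eight access channels).

```python
from typing import Sequence


def spare_units(units_per_server: Sequence[int], active_channels: int) -> int:
    """Units left unassigned when every active channel holds exactly one unit."""
    return sum(units_per_server) - active_channels


# Figure example: 4 servers x 3 units = 12 units for 8 access channels.
figure_spare = spare_units([3, 3, 3, 3], 8)
```

With all eight channels active, four units remain free, so a reassignment target exists even under full load. With a strict 1:1 ratio the result would be zero, which is the extreme case in which no reassignment is possible.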

In reality a front-end computer unit 2 customarily has a considerably higher number of access channels 6, for example 120, 500 or even 1000 and more access channels. In a real dialogue system with a front-end computer unit with 120 access channels, for example, twelve speech processing units may then accordingly be located on each of ten servers, so that all in all at least one speech processing unit is again available for each access channel.

The front-end computer unit 2 is connected to the servers 18, 19, 20, 21 via suitable audio data channels 25. In the Figure only one audio data channel 25 is shown per server 18, 19, 20, 21. However, it is also possible to have several audio data channels 25 per server 18, 19, 20, 21, for example one audio data channel 25 per speech processing unit 22, to be able to provide fast transmission of the audio data for each speech processing unit 22 over its own channel 25.

In the front-end computer unit 2 there is, for each of the access channels 6, a dialogue control which controls a dialogue with the user taking place over the respective access channel, as well as a suitable speech output unit for system outputs to the user. These units are not shown for clarity.

Since this is a dialogue system capable of barge-in, one speech processing unit 22 must always be available to the respective access channel 6 during the dialogue with the user, to be able to process the speech signal, i.e. recognize the information from it, immediately upon receipt. For this reason a speech processing unit 22 on one of the servers 18, 19, 20, 21 is assigned to each of the access channels 6 the moment a dialogue with a user commences over this access channel. The audio data arriving over the access channel 6 are conveyed directly by the front-end computer unit 2 over the audio data channels 25 to the currently assigned speech processing unit 22, or to the respective server 18, 19, 20, 21 on which the speech processing unit 22 is located.

The audio data first reach a speech activity detector 23 in the speech processing unit 22, which is active all the time and, as it were, “listens in” for whether a speech signal of the user arrives on the access channel 6 currently assigned to the speech processing unit 22. This “listening-in” by the speech processing unit 22, or speech activity detector 23 respectively, costs only little computing power. Once the speech activity detector 23 has detected a speech signal, the speech recognition unit 24 is activated, so that it can immediately begin with the recognition of the speech signal. It is then not necessary to divert the audio data stream once again from one computer unit to another; in particular, there is no need to buffer audio data. Since a speech recognition unit 24 is not activated until a speech signal is detected by the speech activity detector 23, the computing power needed by a speech processing unit 22 is relatively low during a large part of the dialogue.
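The two operating modes of a speech processing unit, cheap listening and full recognition, can be sketched as follows in Python. This is an illustrative model only: the class and method names are hypothetical, and the detector and recognizer are passed in as plain callables standing in for the speech activity detector 23 and the speech recognition unit 24.

```python
from typing import Callable, Sequence

Frame = Sequence[float]


class SpeechProcessingUnit:
    """Listen with a cheap detector; forward frames to the recognizer once speech starts."""

    def __init__(self, detect: Callable[[Frame], bool],
                 recognize_frame: Callable[[Frame], None]) -> None:
        self.detect = detect                    # stands in for the speech activity detector
        self.recognize_frame = recognize_frame  # stands in for the speech recognition unit
        self.recognizer_active = False

    def feed(self, frame: Frame) -> bool:
        """Process one frame; return True if it was passed to the recognizer."""
        if not self.recognizer_active and self.detect(frame):
            self.recognizer_active = True       # the detector activates the recognizer
        if self.recognizer_active:
            self.recognize_frame(frame)         # same unit, so no transfer to another machine
        return self.recognizer_active

    def end_of_utterance(self) -> None:
        """Return to listening-only mode after recognition has finished."""
        self.recognizer_active = False
```

In this model the frame that triggers the detector is handed straight to the recognizer within the same unit, which mirrors the point that the audio stream does not have to be redirected or buffered between computers.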

According to the invention one and the same speech processing unit 22 is not permanently assigned to the respective access channel 6 during a dialogue with a user. Instead, repeatedly in the course of the running dialogue and at specific different times, a new speech processing unit 22 that is then available, i.e. not used by another access channel 6, is assigned to the respective access channel 6.

This assignment always takes place when a recognition of a speech signal input by the user has been terminated, or within a very brief time frame after the start of a new prompt to the respective user. At this time it need not be expected that the user interrupts the dialogue system to input a new speech command. Normally an interruption by the user occurs a couple of milliseconds after the beginning of a prompt at the earliest. In this manner a reassignment of the individual speech processing units 22 to the then active access channels 6 is made continually, without this being noticeable to the users, for example through longer reaction times of the dialogue system.
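The timing rule can be expressed as a small predicate. The following Python sketch is an assumption-laden illustration: the event names and the default window of 20 ms are invented for the example, since the text speaks only of "a very brief time frame" and "a couple of milliseconds".

```python
from enum import Enum


class ChannelEvent(Enum):
    RECOGNITION_FINISHED = "recognition_finished"
    PROMPT_PLAYING = "prompt_playing"
    USER_SPEAKING = "user_speaking"


def reassignment_allowed(event: ChannelEvent,
                         ms_since_prompt_start: float = 0.0,
                         window_ms: float = 20.0) -> bool:
    """Decide whether a channel may be moved to another speech processing unit now."""
    if event is ChannelEvent.RECOGNITION_FINISHED:
        return True                                   # recognition has just ended
    if event is ChannelEvent.PROMPT_PLAYING:
        # Only the first moments of a prompt are considered safe.
        return 0.0 <= ms_since_prompt_start <= window_ms
    return False                                      # never while the user is speaking
```

A channel whose user is currently speaking is never moved, because its speech recognition unit is in use at that moment.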

To prevent a system output from running on although the user has already replied to the dialogue system and input a speech signal himself, the speech activity detector 23 further sends a respective signal to the dialogue control that serves the access channel 6, for example over a local area network link 5 or a similar data channel via which the servers 18 to 21 are connected to the front-end computer unit 2. This dialogue control then interrupts the current system output.
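How the dialogue control might react to that signal is sketched below in Python. The class and its methods are hypothetical; the transport of the signal over the network link is left out, and the method `on_speech_detected` simply stands for the arrival of the signal from the speech activity detector.

```python
from typing import Optional


class DialogueControl:
    """Per-channel dialogue control that stops a prompt when barge-in is signaled."""

    def __init__(self) -> None:
        self.current_prompt: Optional[str] = None

    def start_prompt(self, text: str) -> None:
        """Begin a system output on this access channel."""
        self.current_prompt = text

    def on_speech_detected(self) -> bool:
        """Handle the detector's signal; return True if a prompt was interrupted."""
        if self.current_prompt is None:
            return False
        self.current_prompt = None   # stop the running system output
        return True
```

Only the short notification travels back to the front-end computer unit in this scheme; the audio data themselves remain on the server where the recognition takes place.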

The assignment of the speech processing units 22 on the various servers 18 to 21 to the respective active access channel 6 is effected by means of an access co-ordination unit (Resource Manager) 3 which is located on the front-end computer unit 2. This access co-ordination unit 3 comprises a so-termed switch matrix 4 which, purely in hardware, switches the access channels 6 via the audio data channels 25 to the desired speech processing units 22. This hardware implementation of the switch has the advantage that the processor of the front-end computer unit is not loaded by the audio data.

Since the speech activity detectors 23 are also located directly on the servers 18, 19, 20 and 21 in the speech processing units 22 and not in the front-end computer unit 2, it is not necessary at all, in the described embodiment of the invention, for the audio data arriving over an access channel 6 to be led through a processor of the front-end computer unit 2, which processor would present a bottleneck for the audio data stream and thereby reduce the efficiency of the whole system.

When a new speech processing unit 22 is assigned to an active access channel 6, the access co-ordination unit 3 ensures that the individual servers 18, 19, 20, 21 are loaded as evenly as possible as regards the required computing power and the current storage requirement. For this purpose, standardized capacity utilization values are transmitted from the individual servers 18, 19, 20, 21, for example via the local area network link 5, to the access co-ordination unit 3 in the front-end computer unit 2, on the basis of which values the access co-ordination unit 3 can determine the load of the individual servers 18, 19, 20, 21. Based on these load values the reassignment is then made such that the values are evened out. This way of proceeding will be explained once again in the following with the aid of a “snapshot” taken during the operation of the barge-in dialogue system 1.

To this end, it is assumed that at a particular instant a user is being served on each of the eight access channels 6, i.e. all access channels 6 are active. The dialogues running on the access channels 6 are completely independent of each other. This means that at a certain instant system outputs are being made on several of the access channels 6, whereas on others of the access channels 6 the user is uttering a speech signal, i.e. a speech signal is arriving. Depending on whether a speech signal has to be processed or not, different computing power is required by the speech processing unit 22 then assigned to the respective active access channel, which puts a different load on the respective servers 18, 19, 20, 21.

It is furthermore assumed that the current assignment of the speech processing units 22 to the access channels 6 at a specific instant happens to be such that two of the speech processing units 22 on each of the four servers 18, 19, 20, 21 are each assigned to one access channel 6, whereas the third speech processing unit 22 on each server is not yet seized. It is further assumed that at a particular instant a recognition of a speech signal input by the user has just taken place on one of the access channels 6 and a prompt is being issued to the user. Simultaneously, the access co-ordination unit 3 establishes with the aid of the utilization values that the server 18, on which the speech processing unit 22 currently assigned to this access channel 6 is located, has a relatively high degree of utilization, because on the other access channel 6, which is assigned to the second speech processing unit 22 of the same server 18, the user is just entering a speech signal which is being processed by the speech recognition unit 24 of this speech processing unit 22. On the other hand, another server 19 of the four servers 18, 19, 20, 21 has relatively low utilization, because system outputs are taking place on the two associated, currently assigned access channels 6 and the users are therefore not entering any speech signals. The two remaining servers 20, 21 have an average level of utilization because here too one of the speech processing units 22 is busy recognizing a speech signal. The access co-ordination unit 3 on the front-end computer unit 2 will therefore take the opportunity to assign a new speech processing unit 22 to the access channel 6 on which the prompt is just being output, so as to relieve the server 18 on which the speech processing unit 22 currently assigned to the respective access channel 6 is located. Based on the utilization values, the third, free speech processing unit 22 on the server that has the least load at the time is selected.
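The situation just described can be replayed with a small Python model of the access co-ordination unit. Everything in it is a sketch under stated assumptions: the class `ResourceManager`, its data layout and the numeric load values (0.8, 0.2, 0.5, 0.5) are invented for illustration, as the text gives only qualitative utilization levels.

```python
from typing import Dict, List, Optional, Tuple

Unit = Tuple[int, int]  # (server number, unit index on that server)


class ResourceManager:
    """Toy model of the access co-ordination unit's assignment table."""

    def __init__(self, units_per_server: Dict[int, int]) -> None:
        self.free: Dict[int, List[int]] = {
            server: list(range(count)) for server, count in units_per_server.items()
        }
        self.assigned: Dict[int, Unit] = {}  # access channel -> unit

    def reassign(self, channel: int, loads: Dict[int, float]) -> Optional[Unit]:
        """Move `channel` to a free unit on the least-loaded server, if that helps."""
        old = self.assigned.get(channel)
        candidates = [s for s, units in self.free.items() if units]
        if not candidates:
            return old                       # nothing free: keep the current unit
        target = min(candidates, key=lambda s: loads[s])
        if old is not None and loads[old[0]] <= loads[target]:
            return old                       # current server is already no worse
        new_unit = (target, self.free[target].pop(0))
        if old is not None:
            self.free[old[0]].append(old[1])  # release the previous unit
        self.assigned[channel] = new_unit
        return new_unit


# Snapshot: two channels per server, the third unit on each server still free.
manager = ResourceManager({18: 3, 19: 3, 20: 3, 21: 3})
for channel, server in enumerate([18, 18, 19, 19, 20, 20, 21, 21]):
    manager.assigned[channel] = (server, manager.free[server].pop(0))

loads = {18: 0.8, 19: 0.2, 20: 0.5, 21: 0.5}  # assumed utilization values
new_unit = manager.reassign(0, loads)          # channel 0 has just started a prompt
```

In this run channel 0 moves from server 18 to the free third unit on server 19, and the unit it previously held on server 18 becomes available again for later reassignments.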

Since during a dialogue the user's speech inputs are continually recognized and prompts are subsequently issued, there are numerous opportunities during a dialogue to assign a new speech processing unit 22 to the access channel 6 on which the dialogue is held. As a result of the frequent reassignment of the speech processing units 22 to the access channels 6, a very even loading of all the servers can be achieved, so that despite a large number of speech processing units logically arranged on the servers, the total computing power of the servers may be reduced. Owing to the suitable selection of the instants of reassignment, it need not be feared that a speech processing unit is unavailable at an instant at which an access channel needs one. All in all, the invention thus makes possible an effective distribution of speech activity detectors and speech recognition units over a large number of servers within a network, an efficient distribution of these resources being provided even in barge-in-capable dialogue applications. Furthermore, the system complexity in the front-end computer unit may be kept very low, so that an efficient distribution of audio data to the individual speech recognition units becomes possible even purely by hardware. However, it is pointed out that the invention is also meaningful in those cases where the front-end computer units distribute the audio data by means of suitable software, for example using their main processor. Since the main processor in such a case is relatively heavily loaded by the distribution anyway, it is a highly noticeable advantage that the speech activity detectors are arranged on the servers and do not form an additional load on the main processor.
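The repeated reassignment at such opportunities can be sketched as an event handler. The sketch is hypothetical throughout: the class `Coordinator`, its methods, and the load figures are assumptions made for illustration and are not taken from the description. It shows one possible way of ensuring that a channel is never left without a unit: the unit currently held is released and a new one is seized in the same step, so that the server currently in use always remains a candidate.

```python
class Coordinator:
    """Hypothetical stand-in for the access co-ordination unit."""

    def __init__(self, units_per_server):
        self.free = dict(units_per_server)              # server -> free units
        self.load = {s: 0.0 for s in units_per_server}  # reported utilization
        self.assigned = {}                              # channel -> server

    def report_load(self, server, value):
        """Store a utilization value transferred by a server."""
        self.load[server] = value

    def assign(self, channel):
        """Assign a unit on the least-loaded server that has a free unit."""
        old = self.assigned.pop(channel, None)
        if old is not None:
            self.free[old] += 1  # release the unit held so far
        candidates = [s for s, n in self.free.items() if n > 0]
        if not candidates:
            raise RuntimeError("no free speech processing unit")
        best = min(candidates, key=lambda s: self.load[s])
        self.free[best] -= 1
        self.assigned[channel] = best
        return best

    def on_prompt_start(self, channel):
        """Reassignment opportunity: recognition has ended, a prompt begins."""
        return self.assign(channel)


coord = Coordinator({"18": 3, "19": 3})
first = coord.assign(1)             # initial assignment at dialogue start
coord.report_load("18", 0.9)        # assumed utilization values
coord.report_load("19", 0.2)
second = coord.on_prompt_start(1)   # channel 1 moves to the lighter server
print(first, second)
```

In this sketch the reassignment is triggered only at the beginning of a system output, which is one of the instants named in the description; other trigger points would require additional handlers.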

It is once more expressly stated that the example of embodiment shown is only one possibility of implementing the system. More particularly, it is also possible for such a dialogue system to have a plurality of front-end computer units 2, each of which in turn contains, for example, a plurality of access channels. Similarly, a separate front-end computer unit may be used for each access channel. An example of this is a dialogue system in which the respective PC of a user itself forms the front-end computer unit, while for example the dialogue control for the respective application is located on this PC and the access to the servers with the speech processing units is effected via an Internet connection. These front-end computer units could then be connected, for example, to a central computer unit which, in essence, only functions as a switching center and, for example, has the resource manager and a respective switch matrix.

1. A method of operating a barge-in dialogue system (1) for parallel use by a plurality of users, which dialogue system comprises one or more front-end computer units (2) having a plurality of access channels (6) for the users and a plurality of servers (18, 19, 20, 21) with a respective number of speech processing units (22) which comprise each a speech activity detector (23) and a speech recognition unit (24), where repeatedly during a dialogue with a user, at various specific times a new speech processing unit (22) on one of the servers (18, 19, 20, 21) is assigned to the access channel (6) of the front-end computer unit (2) utilized by the user so that the servers (18, 19, 20, 21) are loaded as evenly as possible and the speech activity detector (23) detects a speech signal coming in on the currently assigned access channel and activates the speech recognition unit (24).
 2. A method as claimed in claim 1, characterized in that the reassignment of a speech processing unit (22) to an access channel (6) takes place immediately after a recognition of a speech signal entered by the user or within a predefined short period of time at the beginning of a system output to the user.
 3. A method as claimed in claim 1 or 2, characterized in that to each of the access channels (6) in essence permanently during a dialogue with a user a speech processing unit (22) is assigned.
 4. A method as claimed in one of the claims 1 to 3, characterized in that for the individual servers (18, 19, 20, 21) always a load value is determined and an assignment takes place for which the load values of the individual servers (18, 19, 20, 21) are used.
 5. A method as claimed in one of the claims 1 to 4, characterized in that the assignment of a speech processing unit (22) to an access channel (6) is made by means of a hardware circuit (4) which conveys audio data entering the respective access channel (6) directly to the server (18, 19, 20, 21) with the respective speech processing unit (22).
 6. A barge-in dialogue system for parallel use by a plurality of users, comprising one or more front-end computer units (2) having a plurality of access channels (6) for the users, a plurality of servers (18, 19, 20, 21) which comprise each a number of speech processing units (22) with a respective speech recognition unit (24) and a speech activity detector (23) for detecting an incoming speech signal and activating the speech recognition unit (24), and an access co-ordination unit (3) which repeatedly during a dialogue with the user, at various specific times assigns to the user-deployed access channel (6) of the front-end computer unit (2) a new speech processing unit (22) on one of the servers (18, 19, 20, 21) such that the servers (18, 19, 20, 21) are loaded as evenly as possible.
 7. A dialogue system as claimed in claim 6, characterized by means of signaling to the access co-ordination unit (3) the termination of a recognition of a speech signal previously input in an access channel (6) and/or the beginning of a system output to the user via this access channel (6).
 8. A dialogue system as claimed in claim 6 or 7, characterized by means for determining utilization values for the individual servers (18, 19, 20, 21) and means for transferring these utilization values to the access co-ordination unit (3).
 9. A dialogue system as claimed in one of the claims 6 to 8, characterized in that the access co-ordination unit (3) is integrated with the front-end computer unit (2).
 10. A dialogue system as claimed in one of the claims 6 to 9, characterized by a hardware circuit (4) which conveys audio data entering an access channel (6) directly to the servers (18, 19, 20, 21) comprising the speech processing unit (22) assigned to the respective access channel (6) at this time.